AITopics | in-context learning

Collaborating Authors

in-context learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ghost in the Kernel: In-Context Learning with Efficient Transformers via Domain Generalization

Liu, Peilin, Zhou, Ding-Xuan

arXiv.org Machine LearningJul-2-2026

Transformer-based large models have demonstrated remarkable generalization abilities across different tasks by leveraging a context-aware attention module for in-context learning. With richer context, transformers adapt more effectively to the current use case without any parameter updates. However, the quadratic computational and memory complexity with respect to context length significantly slows data processing in softmax transformers. Linear transformers were proposed to address this issue by reducing the complexity to linear dependence on context length, but the design and understanding of the feature mapping in linear attention, from a theoretical viewpoint, remain unclear. In this paper, we investigate the approximation and generalization abilities of linear transformers under a two-staged sampling process from domain generalization. We show that linear transformers perform in-context learning as learning a mapping from context distributions to response functions. A dimension-independent convergence rate is obtained for our generalization analysis, which also exhibits the tradeoff between the regularities of data distributions and latent features. Guided by our theoretical framework, we propose a new perspective on activation and loss design for linearizing pretrained softmax large language models.

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2607.00479

Country: Europe (0.28)

Genre: Research Report (0.63)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Add feedback

In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-separation

Neural Information Processing SystemsJun-23-2026, 03:23:41 GMT

This paper investigates approximation-theoretic aspects of the in-context learning capability of the transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an L2-testing loss uniformly defined across tasks. This result demonstrates that transformers with logarithmic depth can achieve error bounds comparable with those of the least-squares estimator. In contrast, our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers, which suggests a depth-separation phenomenon for transformers in the in-context learning of dynamical systems.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(4 more...)

Add feedback

Understanding Prompt Tuning and In-Context Learning via Meta-Learning

Neural Information Processing SystemsJun-23-2026, 02:49:22 GMT

Prompting is one of the main ways to adapt a pretrained model to target tasks. Besides manually constructing prompts, many prompt optimization methods have been proposed in the literature. Method development is mainly empirically driven, with less emphasis on a conceptual understanding of prompting. In this paper we discuss how optimal prompting can be understood through a Bayesian view, which also implies some fundamental limitations of prompting that can only be overcome by tuning weights. The paper explains in detail how meta-trained neural networks behave as Bayesian predictors over the pretraining distribution, whose hallmark feature is rapid in-context adaptation. Optimal prompting can be studied formally as conditioning these Bayesian predictors, yielding criteria for target tasks where optimal prompting is and is not possible. We support the theory with educational experiments on LSTMs and Transformers, where we compare different versions of prefix-tuning and different weight-tuning methods. We also confirm that soft prefixes, which are sequences of real-valued vectors outside the token alphabet, can lead to very effective prompts for trained and even untrained networks by manipulating activations in ways that are not achievable by hard tokens. This adds an important mechanistic aspect beyond the conceptual Bayesian theory.

artificial intelligence, machine learning, predictor, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Learningto Rank for In-Context Example Retrieval

Neural Information Processing SystemsJun-22-2026, 20:07:56 GMT

Recent advances in retrieval-based in-context learning (ICL) train the retriever using a classification objective, which categorizes in-context examples (ICEs) into the most useful and the rest based on absolute scores. However, during inference, ICEs are retrieved by score ranking rather than classification -- The classification training objective deviates from this test scenario. Hence, in this paper, we propose a novel algorithm that trains a retrieval model by ranking formulation, where the preference rankings between ICEs are given by comparing the likelihood of the LLM generating the correct answer conditioned on each exemplar. By learning to rank, we motivate the retriever to automatically learn diverse rationales why specific examples are more useful for ICL decisions. This addresses the issue that classification models poorly capture broader utility. Experimental results demonstrate the top-1 performance of our proposal across 9 NLP tasks, with ablation studies and case studies further validating the effectiveness of our design.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States > New Jersey (0.28)
North America > United States > Massachusetts (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (0.67)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs

Neural Information Processing SystemsJun-21-2026, 17:36:26 GMT

Pretrained Transformers demonstrate remarkable in-context learning (ICL) capabilities, enabling them to adapt to new tasks from demonstrations without parameter updates. However, theoretical studies often rely on simplified architectures (e.g., omitting MLPs), plain data models (e.g., linear regression with isotropic inputs), and single-source training--limiting their relevance to realistic settings. In this work, we study ICL in pretrained Transformers with nonlinear MLP heads on nonlinear tasks drawn from multiple data sources with heterogeneous input, task, and noise distributions. We analyze a model where the MLP comprises two layers, with the first layer trained via a single gradient step and the second layer fully optimized. Under high-dimensional asymptotics, we prove that such models are equivalent in ICL error to structured polynomial predictors, leveraging results from the theory of Gaussian universality and orthogonal polynomials. This equivalence reveals that nonlinear MLPs meaningfully enhance ICL performance--particularly on nonlinear tasks--compared to linear baselines.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

In-Context Learning Strategies Emerge Rationally

Neural Information Processing SystemsJun-21-2026, 15:24:05 GMT

Recent work analyzing in-context learning (ICL) has identified a broad set of strategies that describe model behavior in different experimental conditions. We aim to unify these findings by asking why a model learns these disparate strategies in the first place. Specifically, we start with the observation that when trained to learn a mixture of tasks, as is popular in the literature, the strategies learned by a model for performing ICL can be captured by a family of Bayesian predictors: a memorizing predictor, which assumes a discrete prior on the set of seen tasks, and a generalizing predictor, where the prior matches the underlying task distribution. Adopting the normative lens of rational analysis, where a learner's behavior is explained as an optimal adaptation to data given computational constraints, we develop a hierarchical Bayesian framework that almost perfectly predicts Transformer nexttoken predictions throughout training--without assuming access to its weights. Under this framework, pretraining is viewed as a process of updating the posterior probability of different strategies, and inference-time behavior as a posteriorweighted average over these strategies' predictions. Our framework draws on common assumptions about neural network learning dynamics, which make explicit a tradeoff between loss and complexity among candidate strategies: beyond how well it explains the data, a model's preference towards implementing a strategy is dictated by its complexity. This helps explain well-known ICL phenomena, while offering novel predictions: e.g., we show a superlinear trend in the timescale for transitioning from generalization to memorization as task diversity increases. Overall, our work advances an explanatory and predictive account of ICL grounded in tradeoffs between strategy loss and complexity.

large language model, machine learning, predictor, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.92)

Add feedback

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

Neural Information Processing SystemsJun-21-2026, 08:22:44 GMT

Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent the transformers optimally learn in-context compared to principled learning algorithms. To investigate this, we employ a meta ICL framework in which each prompt defines a distinctive regression task whose target function is drawn from a hierarchical distribution, requiring inference over both the latent model class and task-specific parameters.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Add feedback

Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

Neural Information Processing SystemsJun-18-2026, 18:23:53 GMT

Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL). Existing theoretical studies on ICL have mainly focused on linear regression tasks, often with i.i.d.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

CCL: Causal-aware In-context Learning for Out-of-Distribution Generalization

Neural Information Processing SystemsJun-18-2026, 17:30:55 GMT

In-context learning (ICL), a nonparametric learning method based on the knowledge of demonstration sets, has become a de facto standard for large language models (LLMs). The primary goal of ICL is to select valuable demonstration sets to enhance the performance of LLMs. Traditional ICL methods choose demonstration sets that share similar features with a given query. However, our experiments reveal that these traditional ICL approaches perform poorly on out-of-distribution (OOD) datasets, where the demonstration set and the query originate from different distributions. To ensure robust performance in OOD datasets, it is essential to learn causal representations that remain invariant between the source and target datasets. Inspired by causal representation learning, we propose causal-aware in-context learning (CCL). CCL captures the causal representations of a given dataset and selects demonstration sets that share similar causal features with the query. To achieve this, CCL employs a novel VAE-based causal representation learning technique. We demonstrate that CCL improves the OOD generalization performance of LLMs both theoretically and empirically.

demonstration, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Transformers are almost optimal metalearners for linear classification

Neural Information Processing SystemsJun-18-2026, 07:37:26 GMT

Transformers have demonstrated impressive in-context learning (ICL) capabilities, raising the question of whether they can serve as metalearners that adapt to new tasks using only a small number of in-context examples, without any further training. While recent theoretical work has studied transformers' ability to perform ICL, most of these analyses do not address the formal metalearning setting, where the objective is to solve a collection of related tasks more efficiently than would be possible by solving each task individually. In this paper, we provide the first theoretical analysis showing that a simplified transformer architecture trained via gradient descent can act as a near-optimal metalearner in a linear classification setting. We consider a natural family of tasks where each task corresponds to a class-conditional Gaussian mixture model, with the mean vectors lying in a shared k-dimensional subspace of Rd. After training on a sufficient number of such tasks, we show that the transformer can generalize to a new task using only eO(k/eR4) in-context examples, where eR denotes the signal strength at test time. This performance (almost) matches that of an optimal learner that knows exactly the shared subspace and significantly outperforms any learner that only has access to the in-context data, which requires Ω(d/eR4) examples to generalize. Importantly, our bounds on the number of training tasks and examples per task needed to achieve this result are independent of the ambient dimension d.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback